PART 1: CONCEPTUAL MODEL


1.1. PROBLEM DEFINITION 

Large-scale applications of remote sensing for the purpose of preparing 
crop estimates, natural resource inventories, disaster assessments, etc. for 
a given geographic region will, in general, involve questions of sampling, since 
complete coverage of the total geographic region and subsequent analysis of data 
collected with complete coverage appear technically and economically infeasible. 
This is true regardless of whether an aircraft or a satellite is involved, and
it applies equally to photography, multispectral measurements, radar, etc. Thus, 
even if remote sensing provided completely accurate data, estimates (of crop 
acreage, natural resources, extent of disaster, etc.) for the total region under 
study will be subject to an error, the so-called error of estimate. This error
arises because inferences regarding the characteristics of the total region
are drawn from selected observations within the region.

It is the purpose of this discussion to present a conceptual model (in 
Part 1) and an empirical application (in Part 2) of the relationship between 
the manner of selecting observations (i.e. the sampling scheme) and its effect 
on the precision of estimates (i.e. the magnitude of the error of estimate) 
from remote sensing. Because of technical and practical considerations, a 
sampling scheme which suggests itself as being useful is a three-stage sampling 
scheme. 1 / The first stage in this scheme is flightlines, the second stage 
is segments within flightlines, and the third is units within segments. In 
general, it can be expected that the various stages contribute differentially 
to the error of estimate. Also, the contribution from the various stages to 
the error of estimate is affected by the number of observations in each of 
the stages (i.e. the subsampling ratios). For instance, an increase in the 
number of flightlines to be analyzed may be both costly and difficult to execute 
but decrease the variance of the overall estimate little. On the other hand,
an increase in the number of segments per flightline may increase costs and
difficulties of analysis little but may have considerable influence on the
precision of the estimate. Thus, an arbitrary mix of number of flightlines,
segments within flightlines, and units within segments may result in high costs
of operation as well as poor estimates. An understanding of the effect of
subsampling ratios on the precision of estimates is, therefore, important for
most remote sensing applications, particularly those of large scale.


1/ The statistical concepts presented here are not new (cf. [1] and [2]).
They are merely adapted to the problem of remote sensing applications.


1.2. PROBLEM CHARACTERISTICS AND ASSUMPTIONS 


It is assumed that remote sensing is used to estimate a population
characteristic (such as acres of a particular crop) in a well-defined geographic
region. The flightlines are assumed to be of equal length. Similarly, segments 2/
within each flightline are of equal size, units within each segment
are of equal size, and there is an equal number of units in each segment and an
equal number of segments within each flightline. Flightline locations are
random within the region, as are segment locations within the flightlines and
unit locations within the segments. 3/

Finally, if a measurement error is present, it is assumed to be constant
and/or random, normally and independently distributed with mean zero and
a standard deviation of $\sigma_\varepsilon$.


1.3. THE VARIANCE MODELS 


In order to achieve our objective of investigating the effect of subsampling
ratios on the precision of estimates from remote sensing, it is necessary to
develop the variance of the estimate in question. We shall do so for both
measurement (continuous) and attribute (binomial) data. But first we shall
discuss the question of how the measurement error affects the variance.


2/ A "segment" is a sampling unit of specific size (i.e. 1 mile by 10
miles) within a flightline.

3/ If "ground observations" are used to "train" the computer or
photo-interpreter, the level of training is assumed to be given. That is
to say, a certain classification accuracy is assumed, and the relationship
of the amount of ground truth to training, and of the level of training to
precision of estimates, are not explicitly considered in the statistical
model to be presented.



1.3.1. The Measurement Error 
In remote sensing, measurement errors are encountered due to deficiencies
in the measuring device, deficiencies in data analysis, etc. Thus, the variance
estimates should include a measurement error component.


Let us assume that the relevant mathematical model for the measurement error
present in the system under study is

(1.3.1)    $Y_{i\alpha} = G_i + g_i + e_{i\alpha}$

where

    $Y_{i\alpha}$ = value of item i obtained in the $\alpha$th repetition,
    $G_i$ = true value of the item,
    $g_i$ = constant bias,
    $e_{i\alpha}$ = random component.

Since the system under study is one where each item is measured only once,
the error ($g_i + e_{i\alpha}$) can be combined into a single term, $\varepsilon_{i\alpha}$, thus simplifying
the model to

(1.3.2)    $Y_{i\alpha} = G_i + \varepsilon_{i\alpha}$


IF the above model (1.3.2) holds, IF the sample is a random sample, and
IF we are dealing with an infinite population, then the variance of estimate
(to be developed below) will remain valid although no measurement term appears
explicitly in the variance definition. However, if we are dealing with a
finite population and the measurement error is not explicitly considered, a
biased variance estimate will result, the bias being approximately equal to
$\sigma_\varepsilon^2 / N$ (where N is the number of members in the population). 4/

In either case, the resulting variance will be the variance for the biased
mean.


4/ cf. [2], p. 305 ff.


1.3.2. Variance of Estimate of the Mean for Measurement Data
and Three-Stage Sampling

The observation $y_{ijk}$ is assumed to be of the form

    $y_{ijk} = \bar{Y} + u_i + v_{ij} + w_{ijk}$

where $\bar{Y}$ is the overall mean and $u_i$ represents a component associated with the
flightline which is constant for all segments within the flightline. The component
$v_{ij}$ represents a variation from segment to segment within the flightline, and $w_{ijk}$
represents a variation from data point to data point within the segment. The
variates $u_i$, $v_{ij}$, and $w_{ijk}$ are assumed independently distributed with mean zero.
The variates have variances of $S_F^2$, $S_S^2$, and $S_D^2$, respectively (F for flightline,
S for segment, and D for data points). The population to be studied contains
a finite number of $N_F$ flightlines, $N_S$ segments within each flightline, and $N_D$
data points within each segment. Finally, samples of $n_F$, $n_S$, and $n_D$
observations are randomly chosen for flightlines, segments, and data points,
respectively. Then the variance of the sample mean is 5/


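(1.3.3)
$$V(\bar{y}) = \frac{N_F - n_F}{N_F}\,\frac{S_F^2}{n_F} + \frac{N_S N_F - n_S n_F}{N_S N_F}\,\frac{S_S^2}{n_S n_F} + \frac{N_D N_S N_F - n_D n_S n_F}{N_D N_S N_F}\,\frac{S_D^2}{n_D n_S n_F}$$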


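An unbiased estimate of $V(\bar{y})$ in (1.3.3) is obtained from the sample
as follows:

(1.3.4)
$$v(\bar{y}) = \frac{1}{n_F n_S n_D}\left[\left(1 - \frac{n_F}{N_F}\right) S_1^2 + \frac{n_F}{N_F}\left(1 - \frac{n_S}{N_S}\right) S_2^2 + \frac{n_F}{N_F}\,\frac{n_S}{N_S}\left(1 - \frac{n_D}{N_D}\right) S_3^2\right]$$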

The variances $S_1^2$, $S_2^2$, and $S_3^2$ are computed from the sample as follows:

(1.3.5)
$$S_1^2 = \frac{n_S n_D \sum_i (\bar{y}_i - \bar{y})^2}{n_F - 1}$$

(1.3.6)
$$S_2^2 = \frac{n_D \sum_i \sum_j (\bar{y}_{ij} - \bar{y}_i)^2}{n_F (n_S - 1)}$$

(1.3.7)
$$S_3^2 = \frac{\sum_i \sum_j \sum_k (y_{ijk} - \bar{y}_{ij})^2}{n_F n_S (n_D - 1)}$$

where $i = 1, \ldots, n_F$; $j = 1, \ldots, n_S$; $k = 1, \ldots, n_D$; and

$$\bar{y}_{ij} = \Big(\sum_k y_{ijk}\Big)/n_D, \qquad \bar{y}_i = \Big(\sum_j \bar{y}_{ij}\Big)/n_S, \qquad \bar{y} = \Big(\sum_i \bar{y}_i\Big)/n_F$$


5/ If the values of $N_F$, $N_S$, and $N_D$ can be considered infinite, or, alternatively,
if the ratios $n_F/N_F$, $n_S/N_S$, and $n_D/N_D$ can be considered negligible, the
finite population correction factors (f.p.c.'s) can be omitted and the
expression for the variance of the sample mean will reduce to

$$V(\bar{y}) = \frac{S_F^2}{n_F} + \frac{S_S^2}{n_F n_S} + \frac{S_D^2}{n_F n_S n_D}$$
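
As a rough illustration of how the mean squares (1.3.5) to (1.3.7) feed into the variance estimate (1.3.4), the following Python sketch treats a balanced three-stage sample stored as a nested list. It assumes that (1.3.4) weights the three mean squares by the usual three-stage f.p.c. terms given in Cochran [2], namely $(1-f_1)$, $f_1(1-f_2)$ and $f_1 f_2(1-f_3)$ with $f_1 = n_F/N_F$, $f_2 = n_S/N_S$, $f_3 = n_D/N_D$, all divided by $n_F n_S n_D$. The function name `three_stage_variance` and the layout `y[i][j][k]` are illustrative choices and do not come from the original study.

```python
from statistics import mean


def three_stage_variance(y, N_F, N_S, N_D):
    """Mean squares and variance estimate for a balanced three-stage sample.

    y[i][j][k] is data point k in segment j of flightline i.
    N_F, N_S, N_D are the population sizes at each stage
    (pass float("inf") to drop the corresponding f.p.c.).
    """
    n_F, n_S, n_D = len(y), len(y[0]), len(y[0][0])

    seg_means = [[mean(seg) for seg in line] for line in y]   # y-bar_ij
    line_means = [mean(m) for m in seg_means]                 # y-bar_i
    grand = mean(line_means)                                  # y-bar

    # Mean squares as in (1.3.5), (1.3.6), (1.3.7).
    S1 = n_S * n_D * sum((m - grand) ** 2 for m in line_means) / (n_F - 1)
    S2 = n_D * sum(
        (seg_means[i][j] - line_means[i]) ** 2
        for i in range(n_F) for j in range(n_S)
    ) / (n_F * (n_S - 1))
    S3 = sum(
        (obs - seg_means[i][j]) ** 2
        for i in range(n_F) for j in range(n_S) for obs in y[i][j]
    ) / (n_F * n_S * (n_D - 1))

    # Assumed reading of (1.3.4): f.p.c.-weighted mean squares.
    f1, f2, f3 = n_F / N_F, n_S / N_S, n_D / N_D
    v = ((1 - f1) * S1 + f1 * (1 - f2) * S2 + f1 * f2 * (1 - f3) * S3) / (
        n_F * n_S * n_D
    )
    return grand, (S1, S2, S3), v


if __name__ == "__main__":
    toy = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]   # 2 flightlines x 2 segments x 2 points
    print(three_stage_variance(toy, N_F=10, N_S=10, N_D=100))
```

When all three population sizes are taken as infinite, the estimate collapses to $S_1^2/(n_F n_S n_D)$, i.e. the variance among flightline means divided by the number of flightlines.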


1.3.3. Variance of Estimate of the Mean for Binomial Data and 
Three-Stage Sampling 

In many remote sensing applications the analysis is such that every
unit in the population falls into one of two classes, for example C (= corn)
and O (= other). Thus:

        Number of units in C          Proportion of units in C
        Population     Sample         Population     Sample

            A            a             P = A/N       p = a/n

By means of a simple device it is possible to apply all of the models
and formulas developed above to this situation. Suppose, for the moment, that
we are dealing with a simple random sample and single-stage sampling. Define
$y_i$ as 1 if the observation is in C and as 0 if it is in O. For the population
we then obtain


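(1.3.8)    $Y = \sum_{i=1}^{N} y_i = A$

(1.3.9)    $\bar{Y} = \dfrac{\sum_{i=1}^{N} y_i}{N} = \dfrac{A}{N} = P$

and

(1.3.10)   $\bar{y} = \dfrac{\sum_{i=1}^{n} y_i}{n} = \dfrac{a}{n} = p$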


Consequently, the problem of estimating A and P can be regarded as that of
estimating the total and mean of a population in which every $y_i$ is either 0 or
1. Thus, we can start out with the usual variance formulas in order to develop
the variances for proportions. Without actually developing them 6/, we write
for the population

(1.3.11)   $V(p) = \dfrac{N-n}{N-1}\,\dfrac{PQ}{n}$,    $Q = (1-P)$

and for the sample (assuming a finite population)

(1.3.12)   $v(p) = \dfrac{(N-n)}{N\,(n-1)}\,pq$,    $q = (1-p)$


6/ See Cochran [2], p. 32 ff.


In order to transform (1.3.5), (1.3.6), and (1.3.7) into formulas which
are useful for subsampling for proportions, let us proceed as follows:

Let $a_{ij} = \sum_k y_{ijk}$, where $y_{ijk}$ is either
zero or one, depending on whether it falls into O ("other") or C ("corn"); then



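(1.3.14)   $\bar{y}_i = p_i = \Big(\sum_j p_{ij}\Big)/n_S$

(1.3.15)   $\bar{y} = p = \Big(\sum_i p_i\Big)/n_F$

where $p_{ij} = a_{ij}/n_D$. (Compare the definitions immediately following (1.3.7).)

Then

(1.3.16)   $S_1^2 = \dfrac{n_S n_D \sum_i (p_i - p)^2}{n_F - 1}$

(1.3.17)   $S_2^2 = \dfrac{n_D \sum_i \sum_j (p_{ij} - p_i)^2}{n_F (n_S - 1)}$

(1.3.18)   $S_3^2 = \dfrac{n_D \sum_i \sum_j p_{ij} q_{ij}}{n_F n_S (n_D - 1)}$,    $q_{ij} = (1 - p_{ij})$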

Substituting the above definitions into (1.3.3) - or (1.3.4) - will yield 
the desired variance, v(p) . 
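
The binomial shortcut can be sketched in a few lines of Python. The example below assumes that only the per-segment counts $a_{ij}$ (number of data points classified as C) and the common number of data points per segment $n_D$ are available, and it reads the three mean squares as $n_S n_D \sum (p_i - p)^2/(n_F - 1)$, $n_D \sum\sum (p_{ij} - p_i)^2 / (n_F(n_S - 1))$ and $n_D \sum\sum p_{ij} q_{ij} / (n_F n_S (n_D - 1))$. The function name `binomial_mean_squares` and the toy counts are hypothetical.

```python
def binomial_mean_squares(a, n_D):
    """Mean squares for 0/1 data from per-segment counts.

    a[i][j] is the number of data points in class C in segment j of
    flightline i; every segment is assumed to hold n_D data points.
    """
    n_F, n_S = len(a), len(a[0])

    p_ij = [[count / n_D for count in line] for line in a]
    p_i = [sum(line) / n_S for line in p_ij]
    p = sum(p_i) / n_F

    S1 = n_S * n_D * sum((pi - p) ** 2 for pi in p_i) / (n_F - 1)
    S2 = n_D * sum(
        (p_ij[i][j] - p_i[i]) ** 2 for i in range(n_F) for j in range(n_S)
    ) / (n_F * (n_S - 1))
    # For 0/1 data the within-segment sum of squares equals n_D * p_ij * q_ij.
    S3 = n_D * sum(
        p_ij[i][j] * (1 - p_ij[i][j]) for i in range(n_F) for j in range(n_S)
    ) / (n_F * n_S * (n_D - 1))
    return p, S1, S2, S3


if __name__ == "__main__":
    counts = [[2, 4], [6, 8]]          # hypothetical counts, 10 points per segment
    print(binomial_mean_squares(counts, n_D=10))
```

The point of the device is that no individual 0/1 observations need to be stored: the counts per segment are sufficient to obtain all three mean squares.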


1.4. PREDICTION OF THE VARIANCE OF ESTIMATE FOR VARIOUS
SUBSAMPLING RATIOS


We not only desire to evaluate the precision of estimates for a given 
sampling scheme, but we are perhaps even more interested in sampling and
subsampling ratios which are different from those that have been used hitherto.
This information is important for planning future experiments and applications 
of remote sensing on the same type of population. 
From the model in (1.3.2) we can predict the variance of $\bar{y}$ for different
sampling and subsampling ratios. 7/

Suppose in the initial experiment we had values of $n_F$, $n_S$, and $n_D$,
respectively, then the variance was

(1.4.1)    $V(\bar{y}) = \dfrac{S_F^2}{n_F} + \dfrac{S_S^2}{n_F n_S} + \dfrac{S_D^2}{n_F n_S n_D}$

If these values are changed to $n'_F$, $n'_S$, and $n'_D$, respectively, the variance of
the sample mean becomes

(1.4.2)    $V'(\bar{y}) = \dfrac{S_F^2}{n'_F} + \dfrac{S_S^2}{n'_F n'_S} + \dfrac{S_D^2}{n'_F n'_S n'_D}$


In order to utilize this approach, sample estimates of $S_F^2$, $S_S^2$, and $S_D^2$
are required. These may be obtained from the analysis of variance of the
sample data as shown in Table 1.4.1 for measurement data. Each of the
variance components $S_F^2$, $S_S^2$, and $S_D^2$ can be estimated from its mean square and
the one just below. 8/ For example,

    $S_S^2 = (S_2^2 - S_3^2)/n_D$


7/ In the interest of expediency we shall omit all f.p.c.'s from (1.3.3)
whenever it is being used in the following discussion. It should be 
noted that omission of the f.p.c.'s merely results in more conservative 
variance estimates. 

8/ In practice, variance components may turn out to be negative either 

because the model employed is not relevant or because of the nature of 
the sampling distributions of variance components (cf [3] and [5], 
p. 194 ff). 
Table 1.4.1. Analysis of Variance for Three-Stage Sampling.

Source of Variation          Degrees of Freedom     Mean Square                                                                        Estimate of

Between flightlines          $n_F - 1$              $S_1^2 = \dfrac{n_S n_D \sum_i (\bar{y}_i - \bar{y})^2}{n_F - 1}$                   $S_D^2 + n_D S_S^2 + n_S n_D S_F^2$

Between segments             $n_F (n_S - 1)$        $S_2^2 = \dfrac{n_D \sum_i \sum_j (\bar{y}_{ij} - \bar{y}_i)^2}{n_F (n_S - 1)}$     $S_D^2 + n_D S_S^2$
within flightlines

Between data points          $n_F n_S (n_D - 1)$    $S_3^2 = \dfrac{\sum_i \sum_j \sum_k (y_{ijk} - \bar{y}_{ij})^2}{n_F n_S (n_D - 1)}$  $S_D^2$
within segments


While the above discussion utilizes expressions (1.4.1) and (1.4.2), which
relate to measurement data, a translation to binomial data can readily be
made on the basis of the discussion in Section 1.3. The relationships in (1.4.1)
and (1.4.2) hold; only the computational procedure changes.
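
The prediction step can be illustrated with a short Python sketch. It assumes the expected mean squares of the analysis of variance for a balanced design ($S_D^2 + n_D S_S^2 + n_S n_D S_F^2$, $S_D^2 + n_D S_S^2$, and $S_D^2$) and the f.p.c.-free variance $S_F^2/n_F + S_S^2/(n_F n_S) + S_D^2/(n_F n_S n_D)$. The function names and the numerical mean squares are hypothetical and serve only to show the mechanics.

```python
def variance_components(ms_F, ms_S, ms_D, n_S, n_D):
    """Estimate S_F^2, S_S^2, S_D^2 from each mean square and the one below it."""
    S_D2 = ms_D
    S_S2 = (ms_S - ms_D) / n_D
    S_F2 = (ms_F - ms_S) / (n_S * n_D)
    # Estimates may come out negative (see footnote 8); they are returned as is.
    return S_F2, S_S2, S_D2


def predicted_variance(S_F2, S_S2, S_D2, n_F, n_S, n_D):
    """Variance of the sample mean for a (possibly new) design, f.p.c. omitted."""
    return S_F2 / n_F + S_S2 / (n_F * n_S) + S_D2 / (n_F * n_S * n_D)


if __name__ == "__main__":
    # Hypothetical mean squares from an initial design with n_F=2, n_S=3, n_D=6.
    comps = variance_components(ms_F=50.0, ms_S=20.0, ms_D=2.0, n_S=3, n_D=6)
    print("components:", comps)
    print("initial design:", predicted_variance(*comps, n_F=2, n_S=3, n_D=6))
    print("more segments :", predicted_variance(*comps, n_F=2, n_S=6, n_D=3))
    print("more lines    :", predicted_variance(*comps, n_F=4, n_S=3, n_D=3))
```

The last three lines compare designs with the same total number of data points, which is the kind of comparison the section has in mind.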


1.5. OPTIMAL SAMPLING AND SUBSAMPLING FRACTIONS 

These depend on the relationship expressed in (1.3.3) or (1.3.4),
respectively, as well as on the cost function relevant to the system. The
following cost function is proposed: 9/

(1.5.1)    $C = c_F n_F + c_S n_S n_F + c_D n_D n_S n_F + s_D n_D n_S n_F + r_D n_D n_S n_F$


9/ This is a highly simplified cost function and should be considered as 
being illustrative only. 
where

    $C$ = total cost (in dollars) of collecting and analyzing data
    $c_F$ = cost of flying a flightline of a given length and width
    $c_S$ = cost of collecting data over a segment of a given length and width
    $c_D$ = cost of analyzing a data point of a given length and width
    $s_D$ = cost of storing the (analyzed) data point
    $r_D$ = cost of retrieving the results from a data point.

For a given authorized total cost (= available budget) we desire to
select values for $n_F$, $n_S$, and $n_D$ such that $V(\bar{y})$ (or $V(p)$) is a minimum. This is
a problem of constrained minimization, and we shall write

(1.5.2)    $V(\bar{y}) + \lambda\,(C - c_F n_F - \ldots - r_D n_D n_S n_F) = 0$

where $\lambda$ = Lagrangian multiplier

or, substituting (1.3.3) into (1.5.2),

(1.5.3)    $\dfrac{(N_F - n_F)}{N_F}\,\dfrac{S_F^2}{n_F} + \ldots + \dfrac{(N_D N_S N_F - n_D n_S n_F)}{N_D N_S N_F}\,\dfrac{S_D^2}{n_D n_S n_F} + \lambda\,(C - \ldots) = 0$


Differentiating (1.5.3) with respect to $n_F$, $n_S$, and $n_D$, respectively, and
setting the resulting equations equal to zero will result in a set of three
equations which, when solved, will yield the optimum values $n_{F,opt}$, $n_{S,opt}$,
and $n_{D,opt}$. Sample estimates for $S_F^2$, $S_S^2$, and $S_D^2$ will have to be used. Their
computation is discussed in Section 1.3.


By solving the set of equations resulting from a differentiation of (1.5.3)
repeatedly for different values of C, a performance function may be traced out,
showing the relationship between the magnitude of the variance and an ever
costlier data collection scheme. It is hypothesized that this relationship
will have the general form of a hyperbola (see Figure 1.4.1). The area above
the performance function (in Figure 1.4.1) may be termed the "irrational region",
since an improvement can always be achieved for a situation such as represented
by point A in Figure 1.4.1, for a given cost, C, by rearranging the subsampling
ratios so that a movement out of the "irrational region" onto the performance
function occurs. The result will either be a smaller sampling error, $V(\bar{y})$,
for a given cost, C, (a downward movement onto the performance function) or a
lower cost, C, for a given size of sampling error, $V(\bar{y})$, (a leftward movement
onto the performance function).



Figure 1.4.1. (Vertical axis: Sampling Error $V(\bar{y})$.)
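
The idea of tracing out a performance function can be sketched numerically. The Python example below does not solve the Lagrangian equations; it swaps in a plain exhaustive search over integer designs, which is easier to follow and respects the fact that $n_F$, $n_S$, and $n_D$ are whole numbers. It also omits the f.p.c.'s and uses the additive cost function $C = c_F n_F + c_S n_S n_F + (c_D + s_D + r_D)\, n_D n_S n_F$. All cost coefficients, variance components, search limits, and function names are hypothetical.

```python
from itertools import product


def cost(n_F, n_S, n_D, c_F, c_S, c_D, s_D, r_D):
    """Total cost of a design under the illustrative additive cost function."""
    return c_F * n_F + c_S * n_S * n_F + (c_D + s_D + r_D) * n_D * n_S * n_F


def variance(n_F, n_S, n_D, S_F2, S_S2, S_D2):
    """Variance of the sample mean with the f.p.c.'s omitted."""
    return S_F2 / n_F + S_S2 / (n_F * n_S) + S_D2 / (n_F * n_S * n_D)


def best_design(budget, comps, costs, max_n=(10, 10, 50)):
    """Return (variance, n_F, n_S, n_D) of the cheapest-variance feasible design."""
    best = None
    ranges = [range(1, m + 1) for m in max_n]
    for n_F, n_S, n_D in product(*ranges):
        if cost(n_F, n_S, n_D, *costs) <= budget:
            v = variance(n_F, n_S, n_D, *comps)
            if best is None or v < best[0]:
                best = (v, n_F, n_S, n_D)
    return best


if __name__ == "__main__":
    comps = (0.002, 0.02, 0.2)                 # hypothetical S_F^2, S_S^2, S_D^2
    costs = (100.0, 10.0, 0.05, 0.01, 0.01)    # hypothetical c_F, c_S, c_D, s_D, r_D
    for budget in (500, 1000, 2000):
        print(budget, best_design(budget, comps, costs))
```

Plotting the minimum variance against the budget gives a discrete version of the performance function; any design whose variance lies above that curve for its cost falls in what the text calls the "irrational region".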
PART 2. EMPIRICAL ESTIMATES


2.1. OBJECTIVES 


In Part 2 of this paper, an empirical evaluation of the precision of
remote sensing estimates of the "acreage of corn in a given region" will be
developed. The effect of various subsampling ratios on the precision of
estimates will also be investigated empirically.


2.2. THE DATA 


2.2.1. Site of the Experiment 

The site of the experiment from which the data are taken is that of the
"Intensive Study Area" of the 1971 Corn Blight Watch Experiment (CBWE). This
area comprises the westernmost portion of the state of Indiana, a region
which is approximately forty (40) miles wide (in an east-west direction) and
extends over the entire (north-south) length of the state (see Figure 2.2.1).

2.2.2. Source of Data 


The data used for deriving empirical variances of estimate of corn acreage
are the multi-spectral scanner data 10/ collected on Mission 43M of CBWE.
(See Appendix Table 1 and also Table 4, Appendix E, Multi-spectral Data Reliability
Analysis, [4].) These data were collected over thirty (30) randomly selected
segments. Each of these segments was approximately 1 mile wide and 10 miles
long. Data for all segments were collected with identical instruments and
identical techniques. However, data collected over fifteen of the segments were
analyzed by the University of Michigan with its data analysis techniques. The
other fifteen segments were analyzed by LARS with its data analysis techniques.

The location of the "Michigan Segments" and "Purdue Segments" is shown in
Figure 2.2.1.


10/ Photographic data are also available for this site and could have been
used.
2.3. EVALUATION OF DATA


2.3.1. Editing of Data 

The data were first examined for consistency. As a result, segments 210, 
226, and 228 were eliminated from further analysis. 

(a) Segment 210 was eliminated because its area of 7.2 square miles 
was considerably smaller than the planned 10 square miles for 
each segment. 

(b) Segment 226 was eliminated because its area of 14 square miles was 
considerably larger than the planned 10 square miles for each segment. 

(c) Segment 228 was eliminated because its "planimetered acres" were
considerably lower than those for other segments which had a
smaller stated "segment area" (in sq. miles), an obvious inconsistency.
(Further examination of this particular segment revealed that the
segment was an island in the Wabash River.)

2.3.2. Testing for Differences between Michigan Segments and Purdue Segments 

The original intent was to utilize data from the 27 segments (30 minus
those three eliminated because of inconsistencies). However, because of the
differences in analysis techniques there was reason to hypothesize that the data
from the Michigan and the Purdue segments have to be viewed as coming from
different populations. Therefore, it was necessary to perform appropriate tests
before pooling the Michigan segments with the Purdue segments. This was
accomplished by testing independently for differences in proportions and
differences in variances between the Michigan and the Purdue segments. 11/

(a) Difference between Variances of Estimate: The hypothesis tested was

    $H_0: \sigma_M^2 = \sigma_P^2$

    $\alpha = .05$

with the lower critical value

    $F_{(1-.5\alpha)}(12,13) = 1/F_{(.5\alpha)}(13,12) = 1/3.569 = .280$

and the upper critical value $F_{(.5\alpha)}(12,13)$, where

    $\sigma_M^2$ = variance of estimate for Michigan segments
    $\sigma_P^2$ = variance of estimate for Purdue segments
    $v_M$ = sample estimate of $\sigma_M^2$ = .000283
    $v_P$ = sample estimate of $\sigma_P^2$ = .000721

Therefore,

    $F = v_P / v_M = .000721/.000283 = 2.547$

Since

    $F_{(1-.5\alpha)}(12,13) < F < F_{(.5\alpha)}(12,13)$

it is not possible to reject $H_0: \sigma_M^2 = \sigma_P^2$.


11/ This represents a relatively weak test. However, as will be seen below,
the test did distinguish between the two sets of data. Thus, a more
complex and powerful test would have added little for the purpose at hand.


(b) Difference between Proportions: The hypothesis tested was

    $H_0: \Pi_M = \Pi_P$

    $\alpha = .05$

    $Z_{(.5\alpha)} = \pm 1.96$

where

    $\Pi_M$ = proportion of area in corn in Michigan segments
    $\Pi_P$ = proportion of area in corn in Purdue segments
    $P_M$ = sample estimate of $\Pi_M$ = .1514
    $P_P$ = sample estimate of $\Pi_P$ = .2900

Therefore,

    $Z = \dfrac{P_M - P_P}{\sigma_{(P_M - P_P)}} = \dfrac{.1514 - .2900}{.0003} = \dfrac{-.1386}{.0003} = -462.0$

where

    $\sigma_{(P_M - P_P)} = \sqrt{\Pi\,(1 - \Pi)\left(\dfrac{1}{n_M} + \dfrac{1}{n_P}\right)} = .0003$

and

    $\Pi = \dfrac{\sum_m \sum_i y_{im} + \sum_p \sum_j y_{jp}}{n_M + n_P} = \dfrac{1{,}502{,}444}{7{,}268{,}439} = .2067$

where

    $n_M$ = number of data points in Michigan segments
    $n_P$ = number of data points in Purdue segments
    $y_{im}$ = value of ith observation in mth Michigan segment
    $y_{jp}$ = value of jth observation in pth Purdue segment

Since $|Z| > |Z_{(.5\alpha)}|$, the hypothesis $H_0: \Pi_M = \Pi_P$ must be rejected.
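
The arithmetic of this test can be retraced with the data points per segment listed in Appendix Table 1. The Python sketch below assumes that $n_M$ and $n_P$ are the totals over all fifteen segments of each group, an assumption suggested by the fact that these totals add up to the denominator 7,268,439 quoted above. The pooled numerator 1,502,444 and the two sample proportions are taken from the text as given; the variable names are illustrative.

```python
from math import sqrt

# Data points per segment, transcribed from Appendix Table 1 (column 3).
points_M = [387890, 301087, 298075, 379556, 332032, 289830, 306366, 245800,
            262840, 244795, 282812, 221671, 409600, 264795, 261700]
points_P = [158885, 233153, 225342, 165708, 130511, 154467, 207218, 246752,
            208094, 181745, 224446, 194361, 195930, 91457, 161521]

n_M, n_P = sum(points_M), sum(points_P)

# Pooled proportion: numerator as quoted in the text.
pooled = 1_502_444 / (n_M + n_P)

# Sample proportions as quoted in the text.
p_M, p_P = 0.1514, 0.2900

sigma = sqrt(pooled * (1 - pooled) * (1 / n_M + 1 / n_P))
z = (p_M - p_P) / sigma
reject = abs(z) > 1.96          # two-sided test at alpha = .05

print(f"n_M + n_P = {n_M + n_P}")
print(f"pooled proportion = {pooled:.4f}")
print(f"sigma = {sigma:.6f}, Z = {z:.1f}, reject H0: {reject}")
```

With the unrounded standard error the statistic comes out somewhat smaller in magnitude than the -462.0 obtained with the rounded value .0003, but the conclusion is the same: with several million data points, even a small difference in proportions is overwhelmingly significant.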


2.3.3. Further Examination of the Difference between Michigan and Purdue 
Segments 

Rejection of $H_0: \Pi_M = \Pi_P$ necessitates the conclusion that the multi-spectral
data from the Michigan and the Purdue segments may not be pooled for
the purpose of this analysis. However, before proceeding with separate analysis
for either the Michigan or Purdue segments, it is important to examine whether
$\Pi_M$ differs from $\Pi_P$ because of differences in analysis techniques or because
of true differences in the proportion of land in corn in the areas where the
two sets of segments were located. If the latter is the cause for the difference,
then neither set of segments alone is useful for producing estimates for the
entire Intensive Study Area.

To examine this question, "ground observations" for each set of segments
were compared to each other as well as to the multi-spectral data of the
respective set of segments. While no formal statistical tests were made, data
in Table 2.3.1 indicate that estimates from "ground observations" agree well
with estimates from multi-spectral data for the Purdue segments. However, a
substantial downward bias appears to be present in the multi-spectral data for
the Michigan segments. Therefore, only Purdue segments will continue to be
used in the following analysis.

Table 2.3.1. Comparison of Estimates from Ground Observations with Estimates
             from Multi-spectral Scanner (MSS) Data for the Michigan and
             Purdue Segments

                          Michigan Segments                          Purdue Segments
Source of                          Confidence                                 Confidence
Estimate         p       v(p)      Interval*     cv(%)      p       v(p)      Interval*     cv(%)

Ground
Observations   .2377   .001108                     14     .2745   .001739                     15

MSS Data       .1514   .000283   .1159-.1875       11     .2900   .000721   .2328-.3472        9

* $p \pm t_{.05} \sqrt{v(p)}$


2.4. THE VARIANCE OF THE ESTIMATE 


2.4.1. Delineation of Flightlines 

The segments in the Intensive Study Area were not selected on a flightline
basis. Instead, they were selected on a random basis. In order to permit an
analysis of the effect of different subsampling ratios on the precision of the
estimate from a three-stage sampling scheme (flightlines, segments within
flightlines, and data points within segments), hypothetical flightlines had to be
constructed from the available data. Such construction of hypothetical
flightlines assumes that "movement" of segments onto flightlines will not destroy
the validity of the data.

Figure 2.4.1 shows that three (hypothetical) flightlines were used. This 
figure also shows the necessary "movement" of each of the four segments per 
flightline into the respective flightlines. 


2.4.2. Computation of the Variance 

The computation of the variance of estimate and the variance components
for each of the stages follows the procedure which is described elsewhere
(see [1]). 12/ The results are summarized in Table 2.4.1.


12 / Minor modifications were made to account for variability in the number 
of data points per segment. 
Table 2.4.1. Analysis of Variance for a Three-Stage Sampling Scheme in Remote
             Sensing (three flightlines, $n_F$ = 3; four segments per flightline,
             $n_S$ = 4; 199,675 data points per segment, $n_D$ = 199,675).

Source of Variation          Degrees of Freedom     Mean Square     Estimates of

Between flightlines                   2               1,558.5       $S_D^2 + n_D S_S^2 + n_S n_D S_F^2$

Between segments within
flightlines                           9               3,726.6       $S_D^2 + n_D S_S^2$

Between data points
within segments               2,396,101                 .2005       $S_D^2$


In this experiment the f.p.c. cannot be ignored. Therefore, the variance
of estimate follows directly from (1.3.4) and (1.3.13)-(1.3.18). Given
$N_F$ = 44, $N_S$ = 26, and $N_D$ = 31,948 x 10^, then the sample value of the variance
of estimate is 13/


    v(p) = .0006957.
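
The reported value can be approximately retraced from the mean squares in Table 2.4.1. The Python sketch below assumes that (1.3.4) weights the three mean squares by $(1-f_1)$, $f_1(1-f_2)$, and $f_1 f_2(1-f_3)$ and divides by $n_F n_S n_D$, with $f_1 = n_F/N_F$, $f_2 = n_S/N_S$, $f_3 = n_D/N_D$. Because the value of $N_D$ is not fully legible in the source, the figure used for it here is a placeholder assumption; the term it enters is several orders of magnitude smaller than the other two, so the result is insensitive to it.

```python
# Sample sizes and mean squares as given in Table 2.4.1.
n_F, n_S, n_D = 3, 4, 199_675
ms_F, ms_S, ms_D = 1558.5, 3726.6, 0.2005

# Population sizes: N_F and N_S from the text; N_D is an ASSUMED placeholder.
N_F, N_S = 44, 26
N_D = 319_480

f1, f2, f3 = n_F / N_F, n_S / N_S, n_D / N_D
total_points = n_F * n_S * n_D

# Assumed reading of (1.3.4): f.p.c.-weighted mean squares.
v_p = ((1 - f1) * ms_F + f1 * (1 - f2) * ms_S + f1 * f2 * (1 - f3) * ms_D) / total_points

# Same estimate with all f.p.c.'s ignored (compare footnote 13).
v_p_no_fpc = ms_F / total_points

print(f"v(p) with f.p.c.    = {v_p:.7f}")
print(f"v(p) without f.p.c. = {v_p_no_fpc:.7f}")
```

The result agrees with the reported .0006957 to within about one unit in the last digit; the small remainder is plausibly due to the adjustment for unequal numbers of data points per segment mentioned in footnote 12. Without the f.p.c. the calculation gives the .0006504 quoted in footnote 13.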


The variance components $S_F^2$, $S_S^2$, and $S_D^2$ can now be estimated from each mean
square and the one just below. However, $S_F^2$ turned out to be negative. 14/

If it can be assumed that observations within flightlines are random samples
from a normal population, then a test on the intraclass correlation coefficient,
$H_0: \rho_I = 0$, becomes equivalent to $H_0: S_F^2 = 0$. Such a test was executed as follows
(cf. [5], p. 194 ff.):

    $\alpha = .05$

    $F_{.95}(2,9) = 4.26$

    $F = \dfrac{S_1^2}{S_2^2} = \dfrac{1{,}558.5}{3{,}726.6} = .4182$



13/ Had the f.p.c. been ignored, v*(p) = .0006504. Compare this to v(p) = .00072
(Table 2.3.1), where v(p) was computed under the assumption of a simple
random sample and where the f.p.c. was ignored.

14/ This "is not only possible but likely in a design such as this."
See [3].
Since $F < F_{.95}(2,9)$, $H_0$ cannot be rejected. Therefore, the variance
components utilized in the subsequent analysis are as follows:


    $S_S^2$ = .0187

    $S_D^2$ = .2005
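
The variance components and the F ratio can be checked directly from Table 2.4.1. The following Python sketch assumes the expected mean squares of the balanced three-stage analysis of variance; the variable names are illustrative.

```python
# Mean squares and sample sizes from Table 2.4.1.
ms_F, ms_S, ms_D = 1558.5, 3726.6, 0.2005
n_S, n_D = 4, 199_675

# Each component is estimated from its mean square and the one just below.
S_D2 = ms_D
S_S2 = (ms_S - ms_D) / n_D
S_F2_raw = (ms_F - ms_S) / (n_S * n_D)     # comes out negative

# F test of H0: S_F^2 = 0 against the tabulated F.95(2, 9) quoted in the text.
F = ms_F / ms_S
F_crit = 4.26
reject = F > F_crit

# H0 is not rejected, so the flightline component is taken as zero.
S_F2 = 0.0 if not reject else S_F2_raw

print(f"S_D^2 = {S_D2:.4f}, S_S^2 = {S_S2:.4f}, raw S_F^2 = {S_F2_raw:.6f}")
print(f"F = {F:.4f}, reject H0: {reject}")
```

The negative raw estimate arises simply because the between-flightline mean square is smaller than the between-segment mean square.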


2.5. EFFECT OF SUBSAMPLING RATIOS ON 
PRECISION OF ESTIMATES 


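The formula (2.5.1) was evaluated for various values of $n'_F$, $n'_S$, and $n'_D$.
The results are shown in Figures 2.5.1-2.5.3. 15/

(2.5.1)
$$v(p) = \left(\frac{1}{n'_F} - \frac{1}{N_F}\right) S_F^2 + \left(\frac{1}{n'_F n'_S} - \frac{1}{N_F N_S}\right) S_S^2 + \left(\frac{1}{n'_F n'_S n'_D} - \frac{1}{N_F N_S N_D}\right) S_D^2$$
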


Perhaps the most striking observation is that collection and analysis of 
a large number of data points within segments does not improve the precision 
of estimate in this particular application. While on the average nearly 200,000 
data points were actually analyzed in the experiment, our calculation shows 
that this did not improve the precision of estimate over that which is derived 
from $n'_D$ = 50,000, given certain values of $n'_F$ and $n'_S$. Indications are that
a considerably smaller number of observations within segments would be
satisfactory (see Figure 2.5.1).

Because $S_F^2$ turned out to be zero in this analysis, the graphs in Figures
2.5.2 and 2.5.3 are merely mirror images. However, both graphs show that the
gain in precision of estimates levels off relatively quickly, and the collection
of even more data - unless without cost - is likely to become uneconomical
rapidly.
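
The flattening described here can be reproduced with a few lines of Python. The sketch assumes that (2.5.1) has the form $(1/n'_F - 1/N_F) S_F^2 + (1/(n'_F n'_S) - 1/(N_F N_S)) S_S^2 + (1/(n'_F n'_S n'_D) - 1/(N_F N_S N_D)) S_D^2$, uses the variance components adopted above with $S_F^2 = 0$, and, because $N_D$ is not fully legible in the source, substitutes a placeholder value for it. The function name `v_p` is illustrative.

```python
# Population sizes: N_F and N_S from the text; N_D is an ASSUMED placeholder.
N_F, N_S = 44, 26
N_D = 319_480

# Variance components used in the analysis (S_F^2 taken as zero).
S_F2, S_S2, S_D2 = 0.0, 0.0187, 0.2005


def v_p(n_F, n_S, n_D):
    """Predicted variance of the estimated proportion for a given design."""
    return (
        (1 / n_F - 1 / N_F) * S_F2
        + (1 / (n_F * n_S) - 1 / (N_F * N_S)) * S_S2
        + (1 / (n_F * n_S * n_D) - 1 / (N_F * N_S * N_D)) * S_D2
    )


if __name__ == "__main__":
    # Effect of the number of data points per segment (3 flightlines x 4 segments).
    for n_D in (1_000, 10_000, 50_000, 200_000):
        print(f"n_D = {n_D:>7}: v(p) = {v_p(3, 4, n_D):.7f}")
    # With S_F^2 = 0, only the product n_F * n_S matters.
    print(v_p(1, 12, 50_000), v_p(3, 4, 50_000))
```

Under these assumptions the predicted variance is dominated by the segment component, so it changes only in the seventh decimal place between 50,000 and 200,000 data points per segment, and twelve segments on one flightline give the same value as four segments on each of three flightlines.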


15/ For an explanation of the underlying rationale and a definition of
variables see Part 1 of this paper, in particular equations 1.4.1 and
1.4.2.
Figure 2.5.1
2.6. CONCLUSIONS AND IMPLICATIONS


Results from this study are not nearly as important because of what they 
show regarding the precision of estimate for Mission 43M as they are because 
of what they suggest as required analyses in order to assure future practical 
and economical applications of remote sensing. Some of these requirements 
are as follows: 


(1) The statistical theory and model employed here are of the rather 
standard variety (only insignificantly modified for the application at hand) 
and make certain assumptions about the measurement error involved. These 
assumptions are to date untested and may or may not hold. Even if they hold, 
the results obtained here are at best an unbiased variance about the biased 
mean. Furthermore, the distribution of variance components in multistage 
sampling applications to remote sensing needs further study. The fact that in
this study the hypothesis $S_F^2 = 0$ could not be rejected does not rule out the
possibility that the computed value for $S_F^2$ was negative because of an irrelevant
statistical model.

(2) This study, in not permitting rejection of the hypothesis that
$S_F^2 = 0$, points out that we need to develop organized approaches to the use of
a priori information. In retrospect, it appears obvious that, given the
cropping pattern in the westernmost 44 mile wide strip of Indiana, the collection
of sample data over 12 segments in one flightline should yield an estimate as
precise as that obtained by collecting data over 4 segments in each of 3
flightlines. But how can this be determined prior to the experiment? It is
entirely possible that the appropriate use of a priori information (e.g. census
data) could provide the needed insights and basis for designing more efficient
experiments. Perhaps an approach similar to the one used in "Project Chitter" [6]
would be fruitful.

(3) Subsampling ratios and their effect on precision of estimates need
to be examined. This study points out strikingly that there is the temptation
to oversample in some stages without resulting gains in precision (albeit
with increasing costs of data storage and analysis).

(4) To date we know nothing about the relationship between costs, 
subsampling ratios, and precision of estimates. Yet, it would seem less costly 
to collect data over twelve segments in one flightline than to collect data 
over four segments in each of three flightlines. But how much less costly, 
and what is the trade-off in precision? 

(5) Similarly, we know little about the technical and physical diffi- 
culty of collecting data in various ways. How much easier is it to collect 
ground truth on one flightline versus several flightlines? How much easier 
is it to collect "good" data over one flightline versus several? Given the 
presence of a broken cloud cover, what is the effect on the quality of data 
from a few large segments versus a large number of small segments? 
(6) It is not possible to generalize from the results of this study to
other applications. Instead, similar analyses are required for other types
of applications (e.g., estimation of acres in corn at different times during
the season, estimation of acres in other crops, estimation of degree of insect
and disease infestation).

(7) It is unlikely to be practical to develop a unique sampling scheme 
for each application. Instead, various applications may have to be viewed in 
terms of joint costs and joint products. Existing theories of joint costs and 
joint products and associated optimization procedures should be explored 

for their relevance. 

(8) If resources are limited (as they always are) , allocation of resources 
over time for taking samples (i.e. what time periods reflect important change) 
must also become an integral part of the analysis. For instance, changes over 
time in corn blight levels would, in all likelihood, affect the variance and 

the optimal sampling scheme. On the other hand, "acres in corn" may not be
affected by passage of time between planting and harvesting.

(9) When remote sensing is done by aircraft, a sampling scheme such as
the three-stage sampling scheme used in this analysis appears useful.
However, there is no a priori reason why the same model should hold for remote
sensing by satellite, when the satellite sequentially covers the entire region 
to be studied. Perhaps a simple random sample is more appropriate under such 
circumstances. Also, when time of overflight can no longer be controlled, the 
question of the extent to which a broken cloud cover can be used as the 
sample selection device becomes an interesting and important one. 
APPENDIX


Appendix Table 1. Mission 43M (August 9, 1971) Multispectral Scanner Data from
                  the Intensive Study Area.

           Michigan (M)               Pct. of       Acres of     Planim.
           or             Points      Segment       Corn         Acres       Segment
Segment    Purdue (P)     in          Classified    (Ground      of          Area
No.        Segment        Segment     as Corn       Truth)       Segment     (Sq. miles)
(1)        (2)            (3)         (4)           (5)          (6)         (7)

201        M              387890      12.57         1537         7970        12.0
202        M              301087      15.56         2191         6569        10.0
203        M              298075       8.31         2831         6858        11.5
204        M              379556      20.86         2892         7720        11.0
205        M              332032      19.00         1888         7780        12.0
206        P              158885      35.87         2665         5285         9.0
207        P              233153      36.98         3404         7973        12.0
208        P              225342      44.59         3679         7558        12.0
209        P              165708      43.84         2324         6059         9.0
210        P              130511      21.67         1092         4790         7.2
211        M              289830      11.80         2272         6935        10.5
212        M              306366      13.64         2330         7650         9.5
213        M              245800      19.65         1716         6094        10.0
214        M              262840      16.33         1247         5232         9.0
215        P              154467      24.80          864         5750         9.0
216        P              207218      18.31         1278         6932        10.0
217        P              246752      26.37         1758         8030        11.5
218        P              208094      26.02          318         7022        10.5
219        P              181745      14.81          996         5946         9.0
220        M              244795       8.96           97         5774         8.5
221        P              224446      35.64         2362         5835         9.0
222        M              282812       8.42          338         6000         9.0
223        P              194361      20.64          994         6749         8.5
224        M              221671       2.72          201         5120         9.5
225        P              195930      27.42         2125         7275        10.5
226        M              409600      12.80          887         9121        14.0
227        M              264795      23.42         1490         6774         9.5
228        P               91457      60.44          997         3857         8.0
229        M              261700      29.86         1684         5855         8.5
230        P              161521      21.23          871         5535         8.5


